Molecular Evolution of the SARS Coronavirus During the Course of the SARS Epidemic in China

The Chinese SARS Molecular Epidemiology Consortium* 

Sixty-one SARS coronavirus genomic sequences derived from the early, middle, and late phases of the severe acute respiratory syndrome (SARS) epidemic were analyzed together with two viral sequences from palm civets. Genotypes characteristic of each phase were discovered, and the earliest genotypes were similar to the animal SARS-like coronaviruses. Major deletions were observed in the Orf8 region of the genome, both at the start and at the end of the epidemic. The neutral mutation rate of the viral genome was constant, but the amino acid substitution rate of the coding sequences slowed during the course of the epidemic. The spike protein showed the strongest initial responses to positive selection pressures, followed by purifying selection and eventual stabilization.


Severe acute respiratory syndrome (SARS) first emerged in Guangdong Province, China. Subsequently, the SARS coronavirus (SARS-CoV) was identified as the causative agent (1-5). It remains a challenge to establish the relationship between observed genomic variations and the biology of SARS (4-8). Recent molecular epidemiological studies have identified characteristic variant sequences in SARS-CoV for tracking disease transmission (7, 9-11). Evidence suggests that SARS-CoV emerged from nonhuman sources (8, 12). In this study, we sought epidemiological and genetic evidence for viral adaptation to human beings through molecular investigations of the characteristic viral lineages found in China (13).

On the basis of epidemiological investigations (14), we divided the course of the epidemic into early, middle, and late phases (Fig. 1). The early phase is defined as the period from the first emergence of SARS to the first documented superspreader event (SSE) (13). The middle phase refers to the ensuing events up to the first cluster of SARS cases in a hotel (Hotel M) in Hong Kong (15). Cases following this cluster fall into the late phase.

The early phase was initially characterized by a series of seemingly independent cases. Eleven index cases that had arisen locally in the absence of any contact history were identified from different geographical locations within Guangdong Province (fig. S1). This phenomenon was observed from the retrospectively identified SARS index patient from the city of Foshan (onset date, 16 November 2002) (13) through to an index patient from the city of Dongguan (onset date, 10 March 2003). All of these cases were confined to regions directly west of Guangzhou, the capital city of Guangdong Province, and to the city of Shenzhen in the south, with no cases being reported to the north or east of Guangzhou (Fig. 1) (fig. S1). This region, the Pearl River Delta, has enjoyed rapid economic development since the late 1970s, leading to the adoption of culinary habits requiring exotic animals. Seven of these 11 cases had documented contact with wild animals. In contrast to the apparently independent seeding of the earliest cases, the rest of the epidemic was characterized by SSEs and clusters of cases that were epidemiologically linked (Fig. 1) (fig. S1) (10, 11, 13, 15, 16).

The first major SARS outbreak occurred in a hospital, HZS-2, in the city of Guangzhou, beginning on 31 January 2003, where an SSE was found to be associated with more than 130 primary and secondary infections, of which 106 were hospital-acquired cases. Doctor A, a nephrologist who worked in this hospital, visited Hong Kong and stayed in Hotel M on 21 February 2003. Other visitors to the hotel later became infected with SARS-CoV (13, 15). This led to the transmission of SARS to Vietnam, Canada, Singapore, and the United States (17), with two further SSEs in Hong Kong, each resulting in the virus being transmitted to >100 contacts (10, 16).

Genomic sequence data for SARS-CoV were largely derived from isolates linked to the Hotel M cluster (6); hence they were predominantly from the late phase of the epidemic. We determined 29 SARS-CoV genomic sequences obtained from 22 patients from Guangdong Province with disease onset dates in all three phases of the epidemic and from two patients from the late phase in Hong Kong. To eliminate mutational noise, we assumed that sequence variants associated with common ancestry, rather than arising in cell culture, should be seen in multiple isolates (7). In addition, critical genomic variations or complete genome sequences of certain virus isolates were verified by sequencing reverse transcription polymerase chain reaction (RT-PCR) products derived directly from patient specimens (14). The genomic sequences obtained were compared with 32 human SARS-CoV sequences and two SARS-like coronavirus sequences from Himalayan palm civets (Paguma larvata) available at GenBank as of the end of September 2003 (Fig. 2).
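The noise-filtering rule above (a variant counts only if it recurs in more than one isolate) can be sketched as follows. This is an illustrative sketch, not the consortium's pipeline: representing a call as a (position, alternate base) tuple, the isolate names, and position 10,000 are hypothetical; only the "seen in at least two isolates" threshold comes from the text and the Fig. 2 legend.

```python
from collections import Counter

# Hypothetical representation: a variant call is a (reference_position, alt_base)
# tuple in GZ02 coordinates; isolate names below are made up for illustration.


def shared_variants(isolate_variants, min_isolates=2):
    """Return the variant calls observed in at least `min_isolates` isolates.

    Calls seen in only one isolate are dropped as possible cell-culture or
    sequencing noise, mirroring the "seen in multiple isolates" criterion.
    """
    counts = Counter()
    for variants in isolate_variants.values():
        counts.update(set(variants))  # count each variant once per isolate
    return {variant for variant, n in counts.items() if n >= min_isolates}


# Hypothetical calls for three isolates.
calls = {
    "isolate_1": {(23823, "T"), (21721, "G")},
    "isolate_2": {(23823, "T")},
    "isolate_3": {(23823, "T"), (10000, "A")},
}
print(sorted(shared_variants(calls)))
```

Raising `min_isolates` makes the filter stricter; a threshold of 1 keeps every call.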

Only two major genotypes predominated during the early phase of the epidemic. Five isolates were found to contain a 29-nucleotide (nt) sequence that is absent in most of the publicly available SARS-CoV sequences, whereas another four isolates showed a previously unreported 82-nt deletion in the same region of the genome, Orf8 (18) (fig. S2 and table S1). The former sequence is represented by the GZ02 isolate [all GenBank accession numbers are listed in (14)] and is used as the reference for annotation throughout this study. All of the isolates exhibiting this sequence (GZ02, HGZ8L1-A, HSZ-A, HSZ-B, and HSZ-C; Fig. 2) were obtained from patients with contact histories traceable to some of the earliest independent cases in Guangzhou, and this sequence was not detected in any of the later isolates. It is noteworthy that this sequence with the 29-nt segment is identical to the genomic sequence of coronaviruses isolated from animals in a Shenzhen live animal market (8).

Three of the SARS-CoV genome sequences with the 82-nt deletion (ZS-A, ZS-B, and ZS-C; Fig. 2) were obtained from samples of very early cases from Zhongshan city. This 82-nt deletion was further confirmed by RT-PCR performed directly on an additional stool sample. A sequence with an identical 82-nt deletion has also been observed in coronaviruses isolated from farmed civets in Hubei Province, China (19). It is thus interesting to note that both early-phase sequences have also been identified in other mammalian hosts. They provide a link supporting the notion that early human SARS-CoV infections may have originated from wild animals (8, 12).

In contrast to the early phase, the middle phase saw the appearance of a SARS-CoV sequence with the 29-nt deletion that dominated the viral population for the rest of the epidemic (4, 5, 7). Although this shift in genome size might be due to chance, deletion events appeared to be overrepresented in the Orf8 region. A fourth sequence with the 82-nt deletion was obtained from a Guangzhou patient (HGZ8L1-B), who was infected in the same ward as the patient from whom the longest sequence (HGZ8L1-A) was obtained (see above). Furthermore, a lung biopsy of a patient from the middle phase was found to contain two SARS-CoV genotypes, with the 29-nt and the 82-nt deletions, respectively (fig. S3). Remarkably, another genotype with a 415-nt deletion, resulting in the loss of the whole Orf8 region, was isolated and confirmed in two Hong Kong patients with disease onset from mid-May 2003 (Fig. 2) (fig. S2) (20).

Because the majority of deletions observed in the SARS-CoV genome occurred in the Orf8 region with no apparent effect on the survival of the virus, it is tempting to suggest that this region is either noncoding or codes for a functionally unimportant putative protein (table S1). On the other hand, it is interesting to note that antiparallel, reverse-symmetrical sequences (inverted repeats) were readily predicted around the deletion sites (fig. S2), which might account for the high deletion rates in this region. Whether such hairpin structures actually play a role in regulating either RNA replication or mRNA transcription in SARS-CoV is a subject for future studies.
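Inverted repeats of the kind described above can be located with a simple k-mer scan: a stretch whose reverse complement reappears a short distance downstream could pair with it to form a hairpin stem. This is a sketch under stated assumptions, not the prediction method behind fig. S2 (which is not described here). The stem length, loop bounds, and test sequence are hypothetical, sequences use the DNA alphabet as in GenBank records, and real hairpin prediction would normally rely on a thermodynamic RNA-folding program rather than exact matching.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA-alphabet sequence (A, C, G, T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverted_repeats(seq, stem_len=6, min_loop=3, max_loop=30):
    """Find candidate hairpin stems by exact matching.

    Returns pairs (i, j) where seq[j:j+stem_len] is the reverse complement of
    seq[i:i+stem_len] and the loop between them is min_loop..max_loop bases.
    The default parameter values are hypothetical.
    """
    seq = seq.upper()
    k = stem_len
    hits = []
    for i in range(len(seq) - k + 1):
        target = revcomp(seq[i:i + k])
        first_j = i + k + min_loop                 # shortest allowed loop
        last_j = min(i + k + max_loop, len(seq) - k)  # longest loop that still fits
        for j in range(first_j, last_j + 1):
            if seq[j:j + k] == target:
                hits.append((i, j))
    return hits


# Hypothetical test sequence: stem GACTGA, loop TTTT, stem TCAGTC.
print(inverted_repeats("GACTGATTTTTCAGTC"))
```

Exact matching keeps the example short; allowing mismatches or G:U pairs would require a more tolerant comparison.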

Besides the deletion variants, 299 single-nucleotide variations (SNVs) were detected among the 63 sequences. Eighty-five of these variant loci were seen in more than one of the human SARS-CoV sequences. Among them, 52 were predicted to cause amino acid changes (nonsynonymous variations) (table S2). When the epidemiologically determined transmission paths and SNV genotype data are combined, markers for genotypes characteristic of different lineages are evident (Fig. 2) (table S2).
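Whether an SNV is nonsynonymous can be decided by translating the reference and variant codons with the standard genetic code. A minimal sketch, assuming the variant position has already been converted from GZ02 genome coordinates to a 0-based offset within an in-frame coding sequence; that conversion and the toy sequence are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds, pos, alt):
    """Classify a substitution at 0-based offset `pos` of an in-frame CDS.

    Returns (kind, ref_aa, alt_aa, codon_number) with one-letter amino acid
    codes and a 1-based codon number.
    """
    if not 0 <= pos < len(cds) - len(cds) % 3:
        raise IndexError("position lies outside a complete codon")
    offset = pos % 3
    start = pos - offset
    ref_codon = cds[start:start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa, start // 3 + 1


toy_cds = "ATGGATTAA"  # hypothetical: Met-Asp-stop
print(classify_snv(toy_cds, 4, "G"))  # GAT -> GGT
print(classify_snv(toy_cds, 3, "T"))  # GAT -> TAT
print(classify_snv(toy_cds, 5, "C"))  # GAT -> GAC
```

The toy substitutions mirror the kinds of change described below: an A→G change at the second codon position turns GAT (Asp) into GGT (Gly), a G→T change at the first position turns GAT into TAT (Tyr), and a third-position T→C change leaves Asp unchanged.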

Viruses of the early phase have the characteristic motif G:A:C:G:C at GZ02 reference nucleotide residues 17,564, 21,721, 22,222, 23,823, and 27,827, with some of these SNVs matching the C:G:C:C motif identified previously (7) (Fig. 2). This motif is shared by almost all early Guangzhou and Zhongshan isolates, together with the animal SARS-like coronavirus isolates (SZ3 and SZ16) (8). Along with the disappearance of viruses containing the 29-nt segment, the middle phase of the epidemic was characterized by the occurrence of genotypes with the G:A:C:T:C motif (Fig. 2). All of the middle-phase genotypes share this common motif but can be further classified into two variant groups on the basis of other SNVs (table S2). One group was represented by the isolates related to the Hospital HZS-2 outbreak (HZS2-A, HZS2-B, HZS2-C, HZS2-D, HZS2-E, and HGZ8L-2). The other group was represented by the Hong Kong CUHK-W1 isolate that originated from Shenzhen (9), together with the early Beijing isolates BJ01, BJ02, and BJ03, traceable to Guangdong. The change from the early-phase to the middle-phase motif corresponds to a G→T transversion at nucleotide residue 23,823, which is predicted to cause an Asp→Tyr change at amino acid residue 778 of the spike (S) protein (fig. S4).
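The distinction drawn throughout this section between transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse) can be expressed directly. A small sketch; the list of changes and positions is taken from the text, and the function itself is illustrative.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
VALID = PURINES | PYRIMIDINES


def substitution_type(ref, alt):
    """Classify a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in VALID or alt not in VALID:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical: not a substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# (position, ref, alt) changes named in the text, in GZ02 coordinates.
CHANGES = [(17564, "G", "T"), (21721, "A", "G"), (22222, "C", "T"),
           (23823, "G", "T"), (27827, "C", "T")]
for pos, ref, alt in CHANGES:
    print(f"{pos}: {ref}->{alt} is a {substitution_type(ref, alt)}")
```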

An additional A→G transition at nucleotide 21,721 (Fig. 2) (fig. S4) was identified in one isolate (HZS2-Fc) from a secondarily infected patient from Hospital HZS-2 with disease onset on 7 February 2003 (Fig. 2). This sequence was additionally confirmed by direct sequencing of the RT-PCR product from an oropharyngeal swab of the same patient (HZS2-Fb). This mutation is predicted to cause an Asp77→Gly amino acid switch in the S protein (fig. S4), and the resulting G:G:C:T:C motif is so far the genotype closest to that of the Hotel M outbreak (T:G:T:T:T) (Fig. 2) (15). Epidemiologically, this patient is potentially linked to the Hotel M outbreak through her contact with Doctor A during the first 3 days of her illness. Thus, Doctor A was possibly infected with this viral variant.

Fig. 1. The triphasic SARS epidemic in Guangdong Province, China. Shown are daily numbers of SARS cases reported in Guangdong Province, in particular the city of Guangzhou. The early, middle, and late phases of the epidemic are defined in the text. The map shows the geographical distribution of cases belonging to the early phase by administrative districts of Guangdong Province. The detailed data for individual cities are presented in fig. S1.

Additionally, one G→T transversion and two C→T transitions at nucleotide residues 17,564, 22,222, and 27,827 are observed in the Hotel M-associated SARS-CoV genotypes (Fig. 2) (table S2). These SNVs are predicted to cause amino acid switches in the nonstructural polyprotein (Glu1389→Asp), the S protein (Thr244→Ile), and Orf8a (Arg17→Cys), respectively. This T:G:T:T:T motif is shared by the sequences of all isolates infected from and after the Hotel M cluster (7), including the Hong Kong Amoy Gardens isolates (10) and the more recent isolates from Zhejiang (ZJ01), Taiwan (11), and Guangdong (GZ-B, GZ-C, and GZ-D) (Fig. 2) (table S2). This motif is also conserved in the late 415-nt deletion variant in Hong Kong, with the exception of nucleotide 27,827, which falls within the deleted segment (20). Thus, surprisingly few genotypes predominated during the late phase of the epidemic.
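The stepwise relationship among the characteristic motifs can be made concrete by reading the bases at the five GZ02 positions and counting mismatches between motifs. A sketch, assuming genomes have already been aligned to GZ02 coordinates (the `extract_motif` helper is hypothetical); the motifs and positions themselves come from the text.

```python
# GZ02 reference coordinates of the five motif positions (from the text).
MOTIF_POSITIONS = (17564, 21721, 22222, 23823, 27827)

# Motifs as given in the text.
MOTIFS = {
    "early phase": "G:A:C:G:C",
    "middle phase": "G:A:C:T:C",
    "HZS2-Fb/Fc": "G:G:C:T:C",
    "Hotel M / late phase": "T:G:T:T:T",
}


def extract_motif(aligned_genome, positions=MOTIF_POSITIONS):
    """Read the motif from a genome string aligned to GZ02 coordinates (1-based).

    Hypothetical helper: the alignment to GZ02 is assumed to be done already.
    """
    return ":".join(aligned_genome[p - 1] for p in positions)


def motif_distance(m1, m2):
    """Number of motif positions at which two motifs differ (Hamming distance)."""
    a, b = m1.split(":"), m2.split(":")
    if len(a) != len(b):
        raise ValueError("motifs must have the same number of positions")
    return sum(x != y for x, y in zip(a, b))


hotel_m = MOTIFS["Hotel M / late phase"]
for name, motif in MOTIFS.items():
    print(f"{name:<22} {motif}  differs from Hotel M at {motif_distance(motif, hotel_m)} positions")
```

Counting mismatches against T:G:T:T:T gives 5 for the early-phase motif, 4 for the middle-phase motif, 3 for the HZS2-F motif, and 0 for Hotel M, consistent with the HZS2-F genotype being the closest observed precursor.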

The characteristically high mutation rate of RNA viruses (21) may give rise to strains with increased virulence (22) that can either escape host defenses (23) or change their tissue tropism (24). We noticed that the neutral mutation rate for SARS-CoV during this epidemic was almost constant (fig. S5) (14) and was estimated to be $8.26 \times 10^{-6}$ ($\pm 2.16 \times 10^{-6}$) nt$^{-1}$ day$^{-1}$. This is similar to the values obtained for known RNA viruses and is about one-third that for the human immunodeficiency virus (25, 26). In contrast to the constant rate of synonymous variations, the nonsynonymous mutation rates varied across the three epidemic phases (table S3) (14). The predicted domains of the S protein responsible for viral host receptor recognition or internalization (27) were those that underwent the most extensive amino acid substitutions (fig. S4).
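To put the reported neutral rate on a more familiar scale, it can be converted to substitutions per site per year and used to estimate the neutral divergence expected over a given interval under a strict (constant-rate) clock. Worked arithmetic only: the rate and its uncertainty are from the text, while the 365.25-day year and the 100-day interval are illustrative choices.

```python
RATE_PER_DAY = 8.26e-6  # neutral substitutions per nt per day (reported value)
RATE_ERR = 2.16e-6      # reported +/- uncertainty, same units
DAYS_PER_YEAR = 365.25  # illustrative convention


def per_year(rate_per_day):
    """Convert a per-site, per-day rate into a per-site, per-year rate."""
    return rate_per_day * DAYS_PER_YEAR


def expected_neutral_divergence(days, rate_per_day=RATE_PER_DAY):
    """Expected neutral substitutions per site after `days` at a constant rate."""
    return rate_per_day * days


print(f"{per_year(RATE_PER_DAY):.2e} +/- {per_year(RATE_ERR):.1e} substitutions/site/year")
print(f"after 100 days: {expected_neutral_divergence(100):.2e} substitutions/site")
```

The per-year value comes to roughly $3.0 \times 10^{-3}$ substitutions per site per year.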

Between the coronavirus sequences of the palm civets (SZ3 or SZ16) and each of the human SARS-CoV sequences, the ratio of the rates of nonsynonymous to synonymous changes (Ka/Ks) for the S gene was always greater than 1, indicating an overall positive selection pressure. However, pairwise analysis of Ka/Ks for the genotypes in each epidemic group (fig. S6) (14) shows that the average Ka/Ks for the early phase was significantly larger than that for the middle phase, which in turn was significantly larger than the ratio for the late phase; the late-phase ratio was in fact significantly less than 1 (table S3). These data indicate that the S gene was under the strongest positive selection pressure initially, followed by purifying selection and eventual stabilization. For Orf1a, we observed a pattern similar to that for the S gene (table S3). In contrast, Orf1b (nt coordinates 13,398 to 21,485) seems to have been under purifying selection during the whole course of the epidemic. Indeed, it is the most conserved genomic region of SARS-CoV (7).
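The Ka/Ks ratio compares the proportion of nonsynonymous differences per nonsynonymous site with the proportion of synonymous differences per synonymous site. The method actually used is specified in (14) and not reproduced here; the sketch below assumes Nei-Gojobori-style proportions (site and difference counts supplied by the caller) with a Jukes-Cantor correction, and the counts in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Ka/Ks from difference and site counts (Nei-Gojobori-style proportions).

    Counting synonymous and nonsynonymous sites per codon is assumed to have
    been done elsewhere; this function only applies the correction and ratio.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ZeroDivisionError("no synonymous differences: Ka/Ks is undefined")
    return ka / ks


# Hypothetical counts for one pairwise comparison of S gene sequences.
ratio = ka_ks(nonsyn_diffs=8, nonsyn_sites=2900, syn_diffs=1, syn_sites=870)
verdict = "(>1 suggests positive selection)" if ratio > 1 else "(<1 suggests purifying selection)"
print(f"Ka/Ks = {ratio:.2f}", verdict)
```

Averaging such pairwise ratios within each epidemic phase is one way to obtain per-phase summaries like those compared above.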

Our analysis thus suggests that adaptive pressures operated on the SARS-CoV genome but stabilized during the late phase of the epidemic with the emergence of a predominant genotype. Alternatively, sampling bias toward cases related to SSEs (28) may have distorted the data. We believe that such a sampling strategy is justifiable from a public health perspective, because the viral genotypes associated with SSEs are the most epidemiologically important. However, to explore the possibility of bias, we estimated the date of the most recent common ancestor of the available samples. On the basis of the observed neutral mutation rate, this date was estimated to lie in mid-November 2002 (95% confidence interval: early June to late December 2002) (14). This result is consistent with the onset date of 16 November 2002 for the earliest index patient from Foshan (13) and supports the conclusion that the genotypes we studied from the early, middle, and late phases represent different stages of evolution of the same viral lineage. This is further evident from the remarkable correlation between the molecular clustering and the epidemiological grouping of the genotypes throughout the epidemic (Fig. 2) (table S2).

Fig. 2. Genotype clustering of SARS-CoV during the course of the epidemic. An unrooted phylogenetic tree of SARS-CoV was constructed from 61 human SARS-CoV genomes and two SARS-like coronavirus sequences from palm civets. Only those variant sequences (including deletions) that were present in at least two independent samples were used for tree construction (table S2). The map distance between individual sequences represents the extent of genotypic difference. The 5-nt motifs (see text) that characterize the phylogenetically related genotypes are boxed. The genomic sequences are named in accordance with their GenBank nomenclature and are colored according to the genotype clusters determined by our scoring method (table S2). Genotypes with major deletions are marked specifically (see text). All other genotypes (unmarked) had the 29-nt deletion. This 29-nt deletion was specifically marked for three genotypes, namely GZ-A, JMD, and GZ50, to indicate their special clustering within the early-phase isolates.

1668    12 MARCH 2004 VOL 303 SCIENCE www.sciencemag.org

Reports
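Under a strict molecular clock, the common-ancestor date of two sampled lineages follows from the neutral rate: each lineage accumulates r(t_i - T) substitutions per site after the ancestor at date T, so the pairwise neutral distance d satisfies d = r[(t1 - T) + (t2 - T)], giving T = (t1 + t2 - d/r)/2. The sketch below implements only this pairwise relation with the reported rate. The full estimate and confidence interval described above come from (14) and are not reproduced, and the sampling dates in the example are hypothetical.

```python
from datetime import date

NEUTRAL_RATE = 8.26e-6  # substitutions per nt per day, as reported in the text


def pairwise_tmrca(date1, date2, neutral_distance, rate=NEUTRAL_RATE):
    """Strict-clock date of the common ancestor of two sampled sequences.

    Each lineage accumulates rate * (t_i - T) substitutions per site after
    the ancestor at date T, so neutral_distance = rate * ((t1 - T) + (t2 - T)),
    which rearranges to T = (t1 + t2 - neutral_distance / rate) / 2.
    """
    summed_branch_days = neutral_distance / rate
    t_ordinal = (date1.toordinal() + date2.toordinal() - summed_branch_days) / 2
    return date.fromordinal(round(t_ordinal))


# Hypothetical example: two samples whose neutral distance implies
# 245 branch-days in total.
d1, d2 = date(2003, 2, 1), date(2003, 5, 1)
print(pairwise_tmrca(d1, d2, neutral_distance=NEUTRAL_RATE * 245))
```

With many samples, the same idea is usually applied jointly (for example, by regressing divergence on sampling date) rather than pair by pair.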

In tracing the molecular evolution of SARS-CoV in China, we observed that the epidemic started and ended with deletion events, together with a progressive slowing of the nonsynonymous mutation rates and a common genotype that predominated during the latter part of the epidemic. The mechanistic explanation for the selective adaptation and purification processes that led to such genomic evolutionary changes in SARS-CoV requires further work (29). Nonetheless, this study has provided valuable clues to aid further investigation of this remarkable evolutionary tale.

We have sequenced the complete S gene (GenBank accession number AY525636) from an oropharyngeal swab sample (sampling date, 22 December 2003) collected from the most recent index patient in the city of Guangzhou (onset date, 16 December 2003; hospitalized 20 December 2003; www.wpro.who.int/sars/docs/pressreleases/pr_27122003.asp). Phylogenetic analysis of this S gene sequence together with those from the human SARS-CoV and the palm civet SARS-like coronavirus indicated that the virus from this most recent case is much closer to the palm civet SARS-like coronavirus than to any human SARS-CoV detected in the previous epidemic (fig. S7 and table S4). Because this case is evidently different from the recent laboratory infections in Singapore (www.who.int/csr/don/2003_09_24/en) and Taiwan (www.who.int/mediacentre/releases/2003/np26/en), it strengthens the argument for an animal origin of the human SARS epidemic.
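The "closer to the palm civet virus" comparison can be illustrated with the simplest distance measure, the uncorrected p-distance between aligned sequences. This is only a nearest-neighbor sketch, not the phylogenetic analysis behind fig. S7 and table S4; the reference names and the short sequences are hypothetical stand-ins for aligned S gene sequences.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def nearest(query, references):
    """Name of the reference sequence with the smallest p-distance to `query`."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical aligned fragments, not real S gene data.
references = {
    "civet_like_hypothetical": "ACGTACGTAC",
    "human_like_hypothetical": "ACGAACCTTC",
}
query = "ACGTACGTTC"
for name, seq in references.items():
    print(name, p_distance(query, seq))
print("nearest:", nearest(query, references))
```

A real analysis would use full-length alignments, a substitution model, and a tree-building method with support values, rather than a single nearest match.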
ated cells for use in research, with potential applications in tissue repair and transplantation medicine. This concept, known as "therapeutic cloning," refers to the transfer of the nucleus of a somatic cell into an enucleated donor oocyte (3). In theory, the oocyte's cytoplasm would reprogram the transferred nucleus by silencing all the somatic cell genes and activating the embryonic ones. ES cells would be isolated from the inner cell mass (ICM) of the cloned preimplantation embryo. When applied in a therapeutic setting, these cells would carry the nuclear genome of the patient; therefore, it is proposed that after directed cell differentiation, the cells could be transplanted without immune rejection to treat degenerative disorders such as diabetes, osteoarthritis, and Parkinson's disease
Derived from a Cloned Blastocyst

Woo Suk Hwang,1,2* Young June Ryu,1 Jong Hyuk Park,3
Eul Soon Park,1 Eu Gene Lee,1 Ja Min Koo,4 Hyun Yong Jeon,1
Byeong Chun Lee,1 Sung Keun Kang,1 Sun Jong Kim,3 Curie Ahn,5
Jung Hye Hwang,6 Ky Young Park,7 Jose B. Cibelli,8
Shin Yong Moon5*

Somatic cell nuclear transfer (SCNT) technology has recently been used to generate 
animals with a common genetic composition. In this study, we report the derivation 
of a pluripotent embryonic stem (ES) cell line (SCNT-hES-1) from a cloned human 
blastocyst. The SCNT-hES-1 cells displayed typical ES cell morphology and cell surface 
markers and were capable of differentiating into embryoid bodies in vitro and of 
forming teratomas in vivo containing cell derivatives from all three embryonic germ 
layers in severe combined immunodeficient mice. After continuous proliferation for 
more than 70 passages, SCNT-hES-1 cells maintained normal karyotypes and were 
genetically identical to the somatic nuclear donor cells. Although we cannot completely 
exclude the possibility that the cells had a parthenogenetic origin, imprinting analyses 
support a SCNT origin of the derived human ES cells. 